On the 'Dimensionality Curse' and the 'Self-Similarity Blessing'
Authors
Abstract
Spatial queries in high-dimensional spaces have been studied extensively in recent years. Among them, nearest-neighbor queries are important in many settings, including spatial databases (Find the k closest cities) and multimedia databases (Find the k most similar images). Previous analyses have concluded that nearest-neighbor search is hopeless in high dimensions due to the notorious "curse of dimensionality." Here, we show that this may be overpessimistic. We show that what determines the search performance (at least for R-tree-like structures) is the intrinsic dimensionality of the data set and not the dimensionality of the address space (referred to as the embedding dimensionality). The typical (and often implicit) assumption in many previous studies is that the data is uniformly distributed, with independence between attributes. However, real data sets overwhelmingly disobey these assumptions; rather, they typically are skewed and exhibit intrinsic ("fractal") dimensionalities that are much lower than their embedding dimension, e.g., due to subtle dependencies between attributes. In this paper, we show how the Hausdorff and Correlation fractal dimensions of a data set can yield extremely accurate formulas that can predict the I/O performance to within one standard deviation on multiple real and synthetic data sets. The practical contributions of this work are our accurate formulas, which can be used for query optimization in spatial and multimedia databases. The major theoretical contribution is the "deflation" of the dimensionality curse: our formulas and our experiments show that previous worst-case analyses of nearest-neighbor search in high dimensions are overpessimistic to the point of being unrealistic. The performance depends critically on the intrinsic ("fractal") dimensionality, as opposed to the embedding dimension that the uniformity and independence assumptions incorrectly imply.
Index Terms: Nearest-neighbor search, multimedia indexing, fractals.
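The two quantities the abstract relies on, the Hausdorff dimension D0 and the correlation dimension D2, are commonly estimated by box counting: overlay a grid of cell side r, let N(r) be the number of occupied cells and S2(r) = sum of p_i^2 the sum of squared cell occupancy fractions; over a suitable range of scales, N(r) ~ r^(-D0) and S2(r) ~ r^(D2). The sketch below is a minimal, assumed implementation of that textbook estimator, not the paper's own code; the helper names `occupancy_fractions` and `fractal_dimensions`, the grid sizes, and the toy data sets are all hypothetical. It only illustrates the abstract's central point that the intrinsic dimension of a data set can be far below its embedding dimension.

```python
import numpy as np


def occupancy_fractions(points, r):
    """Hypothetical helper: grid the data with cells of side r and return
    the fraction of points that falls in each non-empty cell."""
    pts = np.asarray(points, dtype=float)
    lo = pts.min(axis=0)
    span = (pts.max(axis=0) - lo).max()
    if span == 0:
        raise ValueError("all points coincide; dimension is undefined")
    # Scale into [0, 1) with one common factor so grid cells stay cubic
    # and the extreme point does not spill into an extra cell.
    unit = np.minimum((pts - lo) / span, np.nextafter(1.0, 0.0))
    cells = np.floor(unit / r).astype(np.int64)
    _, counts = np.unique(cells, axis=0, return_counts=True)
    return counts / counts.sum()


def fractal_dimensions(points, radii):
    """Hypothetical helper: estimate Hausdorff (D0) and correlation (D2)
    dimensions by fitting log-log slopes over the given cell sides."""
    log_r = np.log(radii)
    log_n, log_s2 = [], []
    for r in radii:
        p = occupancy_fractions(points, r)
        log_n.append(np.log(len(p)))           # N(r): occupied cells
        log_s2.append(np.log(np.sum(p ** 2)))  # S2(r): sum of squared occupancies
    d0 = -np.polyfit(log_r, log_n, 1)[0]  # N(r) ~ r^(-D0)
    d2 = np.polyfit(log_r, log_s2, 1)[0]  # S2(r) ~ r^(D2)
    return d0, d2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    radii = [2.0 ** -k for k in range(2, 6)]  # assumed cell sides 1/4 .. 1/32
    t = rng.random(20_000)

    # Toy data (assumption): a line embedded in 10-D, every coordinate equals t.
    line = np.tile(t[:, None], (1, 10))
    # Toy data (assumption): a square embedded in 10-D, two uniform coords, eight zeros.
    square = np.zeros((20_000, 10))
    square[:, :2] = rng.random((20_000, 2))

    for name, data in [("line", line), ("square", square)]:
        d0, d2 = fractal_dimensions(data, radii)
        print(f"{name}: embedding dim = {data.shape[1]}, D0 ~ {d0:.2f}, D2 ~ {d2:.2f}")
```

Both toy sets live in a 10-dimensional address space, yet the estimates come out near 1 for the line and near 2 for the square; a cost model driven by the embedding dimension alone would treat the two identically. On real data the log-log plot is only approximately linear, so the range of radii used for the fit is itself a judgment call.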
Similar resources
O-17: Female Genital Mutilation: A Curse or Blessing among Women of Reproductive Age in Nigeria
Background: Female genital mutilation (FGM) practice is mostly carried out by traditional circumcisers, who often play other central roles in communities, such as attending childbirths. Increasingly, FGM is also performed by health care providers. However, FGM is recognized internationally as a violation of the human rights of girls and women. The study investigates a broad cross-cultural study...
Low Complexity Gaussian Latent Factor Models and a Blessing of Dimensionality
Learning the structure of graphical models from data usually incurs a heavy curse of dimensionality that renders this problem intractable in many real-world situations. The rare cases where the curse becomes a blessing provide insight into the limits of the efficiently computable and augment the scarce options for treating very under-sampled, high-dimensional data. We study a special class of G...
Visual Recognition using Embedded Feature Selection for Curvature Self-Similarity
Category-level object detection has a crucial need for informative object representations. This demand has led to feature descriptors of ever increasing dimensionality like co-occurrence statistics and self-similarity. In this paper we propose a new object representation based on curvature self-similarity that goes beyond the currently popular approximation of objects using straight lines. Howe...
GPU Accelerated Self-join for the Distance Similarity Metric
The self-join finds all objects in a dataset within a threshold of each other defined by a similarity metric. As such, the self-join is a building block for the field of databases and data mining, and is employed in Big Data applications. In this paper, we advance a GPU-efficient algorithm for the similarity self-join that uses the Euclidean distance metric. The search-and-refine strategy is an...
Natural Resources, Institutions Quality, and Economic Growth; A Cross-Country Analysis
Natural resources as a source of wealth can increase prosperity or impede economic growth. Empirical studies with different specifications and data are also mixed on whether natural resources are a curse or a blessing. In fact, the variety of model specifications, measurements, and samples in the empirical literature makes it difficult to generalize the results. In this study, a growth...
Journal title:
- IEEE Trans. Knowl. Data Eng.
Volume 13, Issue
Pages: -
Publication date: 2001